The Jinan Chinese Learner Corpus
Abstract
We present the Jinan Chinese Learner Corpus, a large collection of L2 Chinese texts written by learners that can be used for educational tasks. This paper introduces the data and describes it in detail. The corpus currently contains approximately 6 million Chinese characters written by students from more than 50 different L1 backgrounds. It is freely available to researchers, either through a web interface or as a set of raw texts. The data can be used in NLP tasks such as automatic essay grading, language transfer analysis, and error detection and correction. It can also support applied and corpus linguistics, including Second Language Acquisition (SLA) research and the development of pedagogical resources. Practical applications of the data and future directions are discussed.
Similar resources
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools for expressing tentativeness and doubt, have been studied in many research papers in the Iranian EFL setting. However, their use in a learner corpus representing Iranian learner English needs more research attention. To this end, this study aimed at investigating how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...
A corpus-based study of the alternating ditransitive verb TELL in native and Chinese learner English corpora
This corpus-based study compares the use of the alternating ditransitive verb TELL by native speakers and Chinese learners of English. The corpora used are the written sub-corpus of ICE-GB (the British component of the International Corpus of English) and CLEC (the Chinese Learner English Corpus). CLEC consists of writing by both low- and high-proficiency L2 learners. By incorporating corpus contrastiv...
Chinese Paraphrases Acquiring Based on Random Walk N Steps
Jun Ma, Yujie Zhang, Jinan Xu, Yufeng Chen (Beijing Jiaotong University, Beijing, 100044, China). Abstract: The conventional "pivot" approach to acquiring paraphrases from a bilingual corpus has a limitation: only candidate paraphrases within two steps are considered. In this paper, we propose a graph-based model for acquiring paraphrases from a phrase translation table. First, we describe a grap...
Metadiscourse Markers in a Corpus of Learner Language: The Case of Iranian EFL Learners
Different issues have been probed in learner corpus research since the late 1980s. However, given the importance of metadiscourse markers (MDMs) in signposting academic discourse, their use in Iranian EFL learners' academic essays is an area of research in need of more serious analysis. Contributing to this line of investigation, this paper reports a corpus-based study of the use of MDMs i...
Starting a Sentence in L2 German - Discourse Annotation of a Learner Corpus
Learner corpora consist of texts produced by second language (L2) learners. We present ALeSKo, a learner corpus of Chinese L2 learners of German, and discuss the multi-layer annotation of the left sentence periphery, notably the Vorfeld.